Microphone on controller with touchpad to take in audio swipe feature data

ABSTRACT

A game controller includes a touchpad that a user, viewing a virtual keyboard on a screen, can soft-touch to move a cursor on the screen and then hard-touch to move the cursor and also send location data to a processor for inputting a letter from the virtual keyboard. A microphone on the touchpad can be used to receive voice signals for training a machine learning module to predict a next letter or next word, or to insert special characters/punctuations/graphics such as “smileys” during the swipe, or to indicate a tone of a Chinese character while typing with Chinese Pinyin.

FIELD

The application relates generally to technically inventive, non-routine solutions that are necessarily rooted in computer technology and that produce concrete technical improvements. In particular, the present application relates to computer simulation controllers with touchpad input.

BACKGROUND

Machine learning, sometimes referred to as deep learning, can be used for a variety of useful applications related to data understanding, detection, and/or classification.

SUMMARY

In computer simulation industries such as gaming industries, multiple data entry modes may exist that can benefit from machine learning to increase precision and robustness.

Present principles thus provide a microphone on a touchpad of a computer simulation controller that can be used to receive voice signals for training a machine learning module to predict a next letter or next word, or to insert special characters/punctuations/graphics such as “smileys” during the swipe, or to indicate a tone of an Asian word character such as a Chinese character while typing with Chinese Pinyin.

Accordingly, an apparatus includes at least one processor and at least one computer storage that is not a transitory signal and that includes instructions executable by the processor to receive a touch signal from a touch surface of a computer simulation controller to identify a first alpha-numeric character. The instructions are executable to input the first alpha-numeric character to at least a first neural network (NN), and receive from the first NN a predicted sequence of alpha-numeric characters including at least a first predicted alpha-numeric character for presentation on at least one display. The instructions also are executable to receive, from at least one microphone, input indicating acceptance or rejection of at least the first predicted alpha-numeric character and provide the input from the microphone to the first NN to train the first NN. The first NN may include plural long short-term memory (LSTM) networks.

In example embodiments, the processor and microphone are embodied in the computer simulation controller. In other embodiments the processor may be embodied in a computer simulation console configured for communicating with the computer simulation controller.

In some implementations, the instructions can be executable to identify at least one punctuation symbol using the input from the microphone, and responsive to identifying the punctuation symbol, present the punctuation symbol on the display.

In some implementations, the instructions can be executable to identify at least one tone using the input from the microphone, and responsive to identifying the tone, identify for presentation on the display at least one Chinese Pinyin character. In such implementations, the instructions may be executable to receive from the touch surface indication of at least two Arabic letters. The instructions further may be executable to identify, using the Arabic letters, at least first and second candidate Chinese words, and responsive to identifying the tone, select the first Chinese word but not the second Chinese word.
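By way of non-limiting illustration, the tone-based selection between candidate words might be sketched as follows. The candidate table, the function name `select_by_tone`, and the use of tone numbers 1 through 4 are assumptions made for this example only; the tone is assumed to have already been identified from the microphone input by a separate component that is not shown.

```python
# Hypothetical candidate table: Pinyin spelling -> tone number -> character.
# A real system would draw candidates from a full dictionary.
CANDIDATES = {
    "ma": {1: "妈", 2: "麻", 3: "马", 4: "骂"},
}


def select_by_tone(pinyin, tone):
    """Return the candidate character matching the spoken tone, or None.

    `pinyin` is the letter sequence typed on the touch surface and `tone`
    is the tone number assumed to have been recognized from the microphone.
    """
    by_tone = CANDIDATES.get(pinyin.lower(), {})
    return by_tone.get(tone)


if __name__ == "__main__":
    # Typing "ma" yields several candidates; the spoken tone selects one.
    print(select_by_tone("ma", 3))
```

In this sketch the typed letters alone leave several candidate words open, and the tone acts only as a selector among them.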

In another aspect, an apparatus includes at least one processor and at least one computer storage that is not a transitory signal and that includes instructions executable by the processor to identify at least one tone using input from a microphone, and responsive to identifying the tone, identify for presentation on a display at least one Asian language character.

In another aspect, an apparatus includes at least one processor and at least one computer storage that is not a transitory signal and that includes instructions executable by the processor to identify at least one punctuation symbol using input from a microphone, and responsive to identifying the punctuation symbol, present the punctuation symbol on a display.

The details of the present application, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example system consistent with present principles;

FIG. 2 is a perspective view of a computer simulation controller with a microphone and a touch pad being used for inputting text presented on a display such as a TV or other audio video device communicating with the game controller directly or via, e.g., a computer game console;

FIG. 3 is a schematic diagram illustrating a soft press and a hard press on the controller touch pad;

FIG. 4 is a flow chart of example logic consistent with present principles related to FIG. 3;

FIG. 5 is a combination of a logic flow chart, data structures, and processing components consistent with present principles;

FIGS. 6-8B are schematic diagrams of a data structure referred to as the heat map in FIG. 5, illustrating steps in use;

FIG. 9 is a block diagram of an example neural network (NN) configured as plural long short-term memory (LSTM) networks for outputting a predicted next word based on current user input;

FIGS. 10-12 are schematic diagrams illustrating operation of the NN in FIG. 9 post-training;

FIG. 13 is a flow chart of example overall logic consistent with present principles;

FIG. 14 is a schematic view of a system in which the microphone is used to input ground truth training data to the neural networks contemporaneous with operation;

FIG. 15 is a schematic view of a system in which the microphone is used to input tones of Chinese characters to the neural networks contemporaneous with operation;

FIG. 16 is a schematic view of a system in which the microphone is used to input punctuation or graphics contemporaneous with operation; and

FIG. 17 is a flow chart of example logic consistent with FIG. 15.

DETAILED DESCRIPTION

Now referring to FIG. 1, this disclosure relates generally to computer ecosystems including aspects of computer networks that may include consumer electronics (CE) devices. A system herein may include server and client components, connected over a network such that data may be exchanged between the client and server components. The client components may include one or more computing devices including portable televisions (e.g. smart TVs, Internet-enabled TVs), portable computers such as laptops and tablet computers, and other mobile devices including smart phones and additional examples discussed below. These client devices may operate with a variety of operating environments. For example, some of the client computers may employ, as examples, operating systems from Microsoft, or a Unix operating system, or operating systems produced by Apple Computer or Google. These operating environments may be used to execute one or more browsing programs, such as a browser made by Microsoft or Google or Mozilla or other browser program that can access websites hosted by the Internet servers discussed below.

Servers and/or gateways may include one or more processors executing instructions that configure the servers to receive and transmit data over a network such as the Internet. Or, a client and server can be connected over a local intranet or a virtual private network. A server or controller may be instantiated by a game console such as a Sony PlayStation®, a personal computer, etc.

Information may be exchanged over a network between the clients and servers. To this end and for security, servers and/or clients can include firewalls, load balancers, temporary storages, and proxies, and other network infrastructure for reliability and security.

As used herein, instructions refer to computer-implemented steps for processing information in the system. Instructions can be implemented in software, firmware or hardware and include any type of programmed step undertaken by components of the system.

A processor may be any conventional general-purpose single- or multi-chip processor that can execute logic by means of various lines such as address lines, data lines, and control lines and registers and shift registers.

Software modules described by way of the flow charts and user interfaces herein can include various sub-routines, procedures, etc. Without limiting the disclosure, logic stated to be executed by a particular module can be redistributed to other software modules and/or combined together in a single module and/or made available in a shareable library. While flow chart format may be used, it is to be understood that software may be implemented as a state machine or other logical method.

Present principles described herein can be implemented as hardware, software, firmware, or combinations thereof; hence, illustrative components, blocks, modules, circuits, and steps are set forth in terms of their functionality.

Further to what has been alluded to above, logical blocks, modules, and circuits described below can be implemented or performed with a general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device such as an application specific integrated circuit (ASIC), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor can be implemented by a controller or state machine or a combination of computing devices.

The functions and methods described below, when implemented in software, can be written in an appropriate language such as but not limited to C# or C++, and can be stored on or transmitted through a computer-readable storage medium such as a random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), compact disk read-only memory (CD-ROM) or other optical disk storage such as digital versatile disc (DVD), magnetic disk storage or other magnetic storage devices including removable thumb drives, etc. A connection may establish a computer-readable medium. Such connections can include, as examples, hard-wired cables including fiber optics and coaxial wires and digital subscriber line (DSL) and twisted pair wires.

Components included in one embodiment can be used in other embodiments in any appropriate combination. For example, any of the various components described herein and/or depicted in the Figures may be combined, interchanged or excluded from other embodiments.

“A system having at least one of A, B, and C” (likewise “a system having at least one of A, B, or C” and “a system having at least one of A, B, C”) includes systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.

Now specifically referring to FIG. 1, an example system 10 is shown, which may include one or more of the example devices mentioned above and described further below in accordance with present principles. Note that computerized devices described in all of the figures herein may include some or all of the components set forth for various devices in FIG. 1.

The first of the example devices included in the system 10 is a consumer electronics (CE) device configured as an example primary display device, and in the embodiment shown is an audio video display device (AVDD) 12 such as but not limited to an Internet-enabled TV with a TV tuner (equivalently, set top box controlling a TV). The AVDD 12 may be an Android®-based system. The AVDD 12 alternatively may also be a computerized Internet enabled (“smart”) telephone, a tablet computer, a notebook computer, a wearable computerized device such as e.g. computerized Internet-enabled watch, a computerized Internet-enabled bracelet, other computerized Internet-enabled devices, a computerized Internet-enabled music player, computerized Internet-enabled headphones, a computerized Internet-enabled implantable device such as an implantable skin device, etc. Regardless, it is to be understood that the AVDD 12 and/or other computers described herein is configured to undertake present principles (e.g. communicate with other CE devices to undertake present principles, execute the logic described herein, and perform any other functions and/or operations described herein).

Accordingly, to undertake such principles the AVDD 12 can be established by some or all of the components shown in FIG. 1. For example, the AVDD 12 can include one or more displays 14 that may be implemented by a high definition or ultra-high definition “4K” or higher flat screen and that may or may not be touch-enabled for receiving user input signals via touches on the display. The AVDD 12 may also include one or more speakers 16 for outputting audio in accordance with present principles, and at least one additional input device 18 such as e.g. an audio receiver/microphone for e.g. entering audible commands to the AVDD 12 to control the AVDD 12. The example AVDD 12 may further include one or more network interfaces 20 for communication over at least one network 22 such as the Internet, a WAN, a LAN, a PAN etc. under control of one or more processors 24. Thus, the interface 20 may be, without limitation, a Wi-Fi transceiver, which is an example of a wireless computer network interface, such as but not limited to a mesh network transceiver. The interface 20 may be, without limitation, a Bluetooth transceiver, Zigbee transceiver, IrDA transceiver, Wireless USB transceiver, wired USB, wired LAN, Powerline or MoCA. It is to be understood that the processor 24 controls the AVDD 12 to undertake present principles, including the other elements of the AVDD 12 described herein such as e.g. controlling the display 14 to present images thereon and receiving input therefrom. Furthermore, note the network interface 20 may be, e.g., a wired or wireless modem or router, or other appropriate interface such as, e.g., a wireless telephony transceiver, or Wi-Fi transceiver as mentioned above, etc.

In addition to the foregoing, the AVDD 12 may also include one or more input ports 26 such as, e.g., a high definition multimedia interface (HDMI) port or a USB port to physically connect (e.g. using a wired connection) to another CE device and/or a headphone port to connect headphones to the AVDD 12 for presentation of audio from the AVDD 12 to a user through the headphones. For example, the input port 26 may be connected via wire or wirelessly to a cable or satellite source 26a of audio video content. Thus, the source 26a may be, e.g., a separate or integrated set top box, or a satellite receiver. Or, the source 26a may be a game console or disk player.

The AVDD 12 may further include one or more computer memories 28 such as disk-based or solid-state storage that are not transitory signals, in some cases embodied in the chassis of the AVDD as standalone devices or as a personal video recording device (PVR) or video disk player either internal or external to the chassis of the AVDD for playing back AV programs or as removable memory media. Also, in some embodiments, the AVDD 12 can include a position or location receiver such as but not limited to a cellphone receiver, GPS receiver and/or altimeter 30 that is configured to e.g. receive geographic position information from at least one satellite or cellphone tower and provide the information to the processor 24 and/or determine an altitude at which the AVDD 12 is disposed in conjunction with the processor 24. However, it is to be understood that another suitable position receiver other than a cellphone receiver, GPS receiver and/or altimeter may be used in accordance with present principles to e.g. determine the location of the AVDD 12 in e.g. all three dimensions.

Continuing the description of the AVDD 12, in some embodiments the AVDD 12 may include one or more cameras 32 that may be, e.g., a thermal imaging camera, a digital camera such as a webcam, and/or a camera integrated into the AVDD 12 and controllable by the processor 24 to gather pictures/images and/or video in accordance with present principles.

Also included on the AVDD 12 may be a Bluetooth transceiver 34 and other Near Field Communication (NFC) element 36 for communication with other devices using Bluetooth and/or NFC technology, respectively. An example NFC element can be a radio frequency identification (RFID) element.

Further still, the AVDD 12 may include one or more auxiliary sensors 38 (e.g., a motion sensor such as an accelerometer, gyroscope, cyclometer, or a magnetic sensor, an infrared (IR) sensor for receiving IR commands from a remote control, an optical sensor, a speed and/or cadence sensor, a gesture sensor (e.g. for sensing gesture command), etc.) providing input to the processor 24. The AVDD 12 may include an over-the-air TV broadcast port 40 for receiving OTA TV broadcasts providing input to the processor 24. In addition to the foregoing, it is noted that the AVDD 12 may also include an infrared (IR) transmitter and/or IR receiver and/or IR transceiver 42 such as an IR data association (IRDA) device. A battery (not shown) may be provided for powering the AVDD 12.

Still further, in some embodiments the AVDD 12 may include a graphics processing unit (GPU) 44 and/or a field-programmable gate array (FPGA) 46. The GPU and/or FPGA may be utilized by the AVDD 12 for, e.g., artificial intelligence processing such as training neural networks and performing the operations (e.g., inferences) of neural networks in accordance with present principles. However, note that the processor 24 may also be used for artificial intelligence processing such as where the processor 24 might be a central processing unit (CPU).

Still referring to FIG. 1, in addition to the AVDD 12, the system 10 may include one or more other computer device types that may include some or all of the components shown for the AVDD 12. In one example, a first device 48 and a second device 50 are shown and may include similar components as some or all of the components of the AVDD 12. Fewer or more devices may be used than shown.

The system 10 also may include one or more servers 52. A server 52 may include at least one server processor 54, at least one computer memory 56 such as disk-based or solid state storage, and at least one network interface 58 that, under control of the server processor 54, allows for communication with the other devices of FIG. 1 over the network 22, and indeed may facilitate communication between servers, controllers, and client devices in accordance with present principles. Note that the network interface 58 may be, e.g., a wired or wireless modem or router, Wi-Fi transceiver, or other appropriate interface such as, e.g., a wireless telephony transceiver.

Accordingly, in some embodiments the server 52 may be an Internet server and may include and perform “cloud” functions such that the devices of the system 10 may access a “cloud” environment via the server 52 in example embodiments. Or, the server 52 may be implemented by a game console or other computer in the same room as the other devices shown in FIG. 1 or nearby.

The devices described below may incorporate some or all of the elements described above.

The methods described herein may be implemented as software instructions executed by a processor, suitably configured application specific integrated circuits (ASIC) or field programmable gate array (FPGA) modules, or any other convenient manner as would be appreciated by those skilled in the art. Where employed, the software instructions may be embodied in a non-transitory device such as a CD ROM or Flash drive. The software code instructions may alternatively be embodied in a transitory arrangement such as a radio or optical signal, or via a download over the Internet.

FIG. 2 illustrates a system 200 the components of which may incorporate appropriate components shown in FIG. 1. A computer simulation controller 202 such as a PlayStation® controller, Xbox® controller, or other controller may include a touchpad 204 that can receive touch signals from a hand 206 and communicate via wired and/or wireless paths 208 with a computer simulation console 210 and/or a display device 212 such as an Internet-enabled TV. As explained further below, the user can manipulate the touchpad 204 to generate alpha-numeric characters 214 for presentation on the display device 212 either through direct communication of signals with the display device or through the simulation console 210. More specifically, by manipulating the touchpad 204, a user can move a screen cursor over a letter on a virtual keyboard 216 presented on the display device 212 to enter the alpha-numeric characters 214. The virtual keyboard 216 may have, without limitation, a QWERTY layout.

Additionally, the controller 202 may include one or more microphones 218 communicating with the processor of the controller for purposes disclosed below. In the example shown, the microphone 218 is provided on the touchpad 204, although it is to be understood that the microphone 218 may be provided elsewhere on the housing of the controller 202 or indeed on another component if desired.

As shown schematically in FIG. 3, present principles contemplate two types of touch, namely, a “soft” press 300 (using a soft pressure on the touchpad or a hover over the touchpad with zero pressure), in which a screen cursor on the display device 212 is moved to desired locations on the virtual keyboard 216 without sending location data (i.e., a signal indicating selection of any particular virtual key) to the display device, and a “hard” press 302 of greater pressure than a soft press, in response to which a screen cursor on the display device 212 may be moved and location data sent to the display device to indicate selection of a virtual key. In this way, a user can look away from the touchpad 204 and view the virtual keyboard 216 while moving his or her finger across the touchpad to move a visible screen cursor to a desired letter on the virtual keyboard, and then exert a hard press to select that letter. Note that an individual “next” letter may not be presented on the display, but rather the next “most possible word” may be displayed after a user has finished a “swipe”. The “hottest” key (based on the heatmap) may be highlighted on the virtual keyboard as well as the trace. In addition, a “swipe” is defined as a continuous hard-press which forms a trace.

FIG. 4 illustrates example logic with the above description in mind. The logic may be executed by one or more of a processor in the simulation controller 202, a processor in the simulation console 210, and a processor in the display device 212.

Commencing at state 400 it is determined whether a press of the touchpad 204 has been received. This may be done by determining whether signals from one or more proximity sensors associated with the touchpad 204 indicate a hover of a finger adjacent the touchpad 204 and/or by determining whether signals from one or more pressure sensors associated with the touchpad 204 indicate a pressure of at least a first threshold pressure.

When it is determined that a touch has been received, the logic proceeds to state 402 to determine whether the touch is a soft press or hard press as indicated by, e.g., signals from a pressure sensor associated with the touchpad 204 indicating a touch of at least a threshold pressure, which is typically set to be greater than any threshold pressure used at state 400. If the touch does not satisfy the threshold, the logic moves to block 404 to return a soft press. In some implementations the logic may proceed to state 406 to determine whether the soft press is the first soft press within, e.g., a threshold period, for example within the last five minutes, and if so the logic can move to block 408 to enlarge an image of the virtual keyboard 216 on the display device 212. In any case, from state 406 if the test there is negative or from block 408, the logic moves to block 410 to move the screen cursor without sending press location information.

On the other hand, if the test at state 402 determines that a hard press is received, such is returned at block 412, and the screen cursor may be moved according to the touch with location information being sent as well indicating the location of the virtual keyboard the user has selected by means of the hard press on the touchpad 204 of the simulation controller 202.
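As a non-limiting sketch of the logic of FIG. 4, the two-threshold test might be expressed as follows. The threshold values, the `PressEvent` structure, and the function name `classify_press` are hypothetical and are chosen only for illustration; pressure is assumed to be reported in arbitrary normalized units, and the optional keyboard enlargement of states 406 and 408 is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical thresholds in normalized sensor units (not from the disclosure).
TOUCH_THRESHOLD = 0.05  # state 400: minimum pressure that counts as a touch
HARD_THRESHOLD = 0.50   # state 402: minimum pressure that counts as a hard press


@dataclass
class PressEvent:
    kind: str                                # "soft" or "hard"
    cursor: Tuple[float, float]              # where the screen cursor should move
    location: Optional[Tuple[float, float]]  # selection data, hard press only


def classify_press(x, y, pressure, hovering=False):
    """Return a PressEvent for a touch sample, or None if no touch occurred."""
    # State 400: a touch exists if a hover is sensed or pressure is sufficient.
    if pressure < TOUCH_THRESHOLD and not hovering:
        return None
    # State 402 / block 412: a hard press moves the cursor and sends location.
    if pressure >= HARD_THRESHOLD:
        return PressEvent("hard", (x, y), (x, y))
    # Blocks 404 and 410: a soft press moves the cursor without location data.
    return PressEvent("soft", (x, y), None)


if __name__ == "__main__":
    print(classify_press(0.2, 0.4, 0.0, hovering=True))
    print(classify_press(0.2, 0.4, 0.8))
```

The only difference between the two outcomes in this sketch is whether location data accompanies the cursor movement, which mirrors the distinction drawn between the soft press and the hard press.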

FIG. 5 illustrates a combination of hardware and software blocks alluded to above.

One or more proximity and/or pressure sensors 500 are provided in the touchpad 204 to output signals representing soft presses 502 and hard presses 504. The soft presses 502 establish finger focus points 506. The hard presses 504 establish points on the touchpad as detected by the sensor(s) 500. A soft-press represents a cursor focus point, while “points by sensor” means “continuous points sent by the sensor”.

At 510 a heatmap algorithm, discussed further below in reference to FIGS. 6-8B, is accessed to output a sequence of letters 512 according to the hard presses 504. The sequence of letters 512 is input along with a dictionary 514 to a reduction block 516 that reduces the list of candidates that might possibly form either a correction to or a completion of the sequence of letters 512. The dictionary 514 is essentially a dictionary and/or thesaurus of sequences of letters that can be used to correct a mis-typed word, e.g., the dictionary 514 may correlate “thw” to “the” to return the word “the” in response to input of “thw”.
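One possible way to realize a reduction of this kind, offered only as a sketch, is to keep dictionary words that either begin with the typed sequence (completions) or lie within a small edit distance of it (corrections). The use of edit distance, the `max_distance` parameter, and the function names are assumptions for this example; the disclosure does not specify how the reduction block 516 matches candidates.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def reduce_candidates(sequence, dictionary, max_distance=1):
    """Return completions of `sequence` followed by likely corrections.

    `max_distance` is a hypothetical tuning parameter for this sketch.
    """
    seq = sequence.lower()
    completions = [w for w in dictionary if w.startswith(seq)]
    corrections = [w for w in dictionary
                   if w not in completions
                   and edit_distance(seq, w) <= max_distance]
    return completions + corrections


if __name__ == "__main__":
    words = ["the", "this", "cow", "thwart"]
    print(reduce_candidates("thw", words))
```

With the small word list shown, the input “thw” retains “thwart” as a completion and “the” as a correction, while “this” and “cow” are dropped.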

The reduced list of candidates 516 is provided to a module 518 that outputs a predicted word or words for presentation on the screen, which a user can then select to complete his or her desired input without typing every letter of the predicted word or words. The module 518 may be established by one or more neural networks (NN) as described further below. To produce a predicted word or words, the module 518 may receive input from a contextual user block 520, which provides previous word strings employed by the user with the current input inferred to possibly be a repeat of a prior input, e.g., “do you” may have been followed multiple times in prior inputs by “know what I mean”, and this information can be input to help train and execute the module 518.

Moreover, similar training/execution aids may be input to the module 518 as shown at the right of FIG. 5. Specifically, queries and chat data 522 from other computer gamers may be input to a character-based NN such as a bidirectional long short-term memory (BILSTM) 524 to learn patterns of common input strings for provision to a machine learning character sequence model 526. This model 526 may be input to or accessed by the module 518 in rendering a next predicted word or words.

FIGS. 6-8B illustrate employment of the heatmap algorithm 510 in FIG. 5. Basically, the “path” or “connected points” of the finger “swipe” (hard-press) is tracked and the probabilities of each letter are “discounted and accumulated” at certain time intervals along the swipe. At each time interval, the letter with the highest probability is extracted, which may also have to pass a certain threshold to be added to the sequence as developed further below.

In FIGS. 6-8B, it is to be understood that only the first four letters in the top left corner of a QWERTY keyboard (i.e., “Q”, next to which is “W”, and below which from left to right are “A” and “S”) are shown for clarity of disclosure, as but one example of a possible virtual keyboard layout for the virtual keyboard 216. In the example heatmap 510 illustrated, each area of the heatmap for a particular letter is divided into a three-by-three grid for nine divisions (illustrated as geometric squares) total, with the center division 600 for a particular letter indicating that the probability of that letter being desired when a cursor is in the center area is 1. In contrast, the heatmap 510 indicates probabilities less than one but greater than zero in the border divisions 602 that surround the center division 600 of a letter, with the probabilities being associated with the letter of the center division 600 and the letter(s) immediately adjacent the border divisions 602 (or, in the case of a border division that is not adjacent another letter, only a probability less than one for the letter of the center division).

As shown in FIG. 7 at 700, a soft press is used to locate the starting letter of an intended input. Then, as shown at 800 in FIG. 8, a hard press is used to indicate selection of the starting letter, in the example shown, “Q”. This causes the collection of data that “Q” is selected with a probability of one and that surrounding letters (in the example shown, “W”, “A”, and “S”) are not selected, i.e., have a probability of zero.

FIGS. 8A and 8B illustrate the results of an ensuing swipe. In FIG. 8A a swipe is shown at 802 from the location starting in FIG. 8 to the location 804 indicated by the image of the hand. Here, the user has moved his finger toward the letter “A”. This causes new heatmap statistics to be aggregated according to the path of the swipe over the border divisions 602 using the algorithm shown in FIG. 8A. Because the probability of “Q” is higher than the probabilities of “W” (which is zero), “A” (which is 0.3), and “S” (which is zero), the sequence returns “Q”.

FIG. 8B shows at 806 that the swipe has been continued to the location 808 shown by the image of the hand. This causes further heatmap statistics to be aggregated according to the path of the swipe over the border divisions 602 using the algorithm shown in FIG. 8B. Because the probability of “A” is higher than the probabilities of “W” (which is zero), “Q” (which is 0.3), and “S” (which is zero), the sequence returns “A” to be appended after “Q” was returned in FIG. 8A, resulting in a sequence “QA”.

Thus, it may now be appreciated that the “path” or “connected points” of the finger “swipe” (hard-press) is tracked and the probabilities of each letter are discounted and accumulated at certain time intervals along the swipe. At each time interval, the letter with the highest probability is extracted, in some embodiments provided the probability of the letter satisfies a threshold probability (e.g., of 0.4) to be added to the sequence.
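A minimal sketch of one way to “discount and accumulate” letter probabilities along a swipe is given below. The exact algorithm of FIGS. 8A and 8B is not reproduced here; the exponential discount factor, the normalization of accumulated scores into probabilities, and the rule of not appending a letter identical to the last one appended are all assumptions of this example, as are the function and parameter names. The threshold of 0.4 follows the example value given above.

```python
def decode_swipe(intervals, discount=0.5, threshold=0.4):
    """Turn per-interval heatmap samples into a letter sequence.

    `intervals` is a list of time intervals; each interval is a list of
    samples, and each sample maps letters to heatmap probabilities at one
    point of the swipe. `discount` is a hypothetical decay factor.
    """
    scores = {}
    sequence = []
    for samples in intervals:
        # Discount what was accumulated during earlier intervals.
        for letter in scores:
            scores[letter] *= discount
        # Accumulate the probabilities sampled during this interval.
        for sample in samples:
            for letter, p in sample.items():
                scores[letter] = scores.get(letter, 0.0) + p
        total = sum(scores.values())
        if total == 0:
            continue
        # Extract the letter with the highest normalized score.
        best = max(scores, key=scores.get)
        passes = scores[best] / total >= threshold
        # Assumption: a letter is not appended twice in a row.
        if passes and (not sequence or sequence[-1] != best):
            sequence.append(best)
    return "".join(sequence)


if __name__ == "__main__":
    swipe = [
        [{"Q": 1.0}],            # hard press in the center division of "Q"
        [{"Q": 0.7, "A": 0.3}],  # border division between "Q" and "A"
        [{"A": 1.0}],            # center division of "A"
    ]
    print(decode_swipe(swipe))
```

Under these assumptions the three intervals shown yield “Q”, then no new letter while “Q” remains dominant, then “A”, giving the sequence “QA” as in the example of FIGS. 8A and 8B. A consequence of the no-repeat rule is that doubled letters would need separate handling.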

FIG. 9 illustrates an example NN architecture that may be used in any of the NN-based modules of, e.g., FIG. 5. A network 900 of NNs may receive input letters 902 with probabilities 904 from the heatmap to output time-distributed predicted letters 906 with associated probabilities 908. In the example shown, each letter 902 may be input to a respective recurrent NN (RNN) such as a sequence of long short-term memory (LSTM) 910 as shown. An LSTM 910 as shown at the right in FIG. 9 may include an input gate 912, a forget gate 914, and an output gate 916, all of which may execute a sigmoid function as indicated by the Greek letter σ in FIG. 9. The input gate 912 controls the extent to which a new value flows into the cell, the forget gate 914 controls the extent to which a value remains in the cell and the output gate 916 controls the extent to which the value in the cell is used to compute the output activation of the LSTM unit.

The current value x_(i) being input and the hidden state h_(t-1) from the previous iteration are input to all three gates as shown. The output of the sigmoid function of the input gate 912 may be combined with a hyperbolic tangent function 918 at a first combine operator 920, which may be an element-wise product. The output of the first combine operator 920 is combined, as by summing if desired, with the output of a second combine operator 922 at a third combine operator 924. The output of the third combine operator 924 may be fed back to the second combine operator 922 for combining with the output of the forget gate 914. Further, the output of the third combine operator 924 may be operated on if desired by a hyperbolic tangent function 926 and then combined at a fourth combine operator 928 with the output of the output gate 916 to render a hidden state vector 930 for use in the succeeding iteration.
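For illustration only, a single iteration of a conventional LSTM cell of the kind described may be sketched in plain Python as follows. The parameter layout (a weight matrix and bias per gate, applied to the concatenation of the previous hidden state and the current input) and all names are assumptions of this sketch; the comments map each step to the reference numerals above on the understanding that FIG. 9 depicts the standard LSTM arrangement.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vec):
    """Compute weights @ vec + bias for list-based matrices and vectors."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM iteration; returns the new hidden state and cell state.

    `params` maps "input", "forget", "output", and "candidate" to a
    (weights, bias) pair. This layout is hypothetical.
    """
    v = h_prev + x_t  # list concatenation of previous hidden state and input
    i = [sigmoid(z) for z in affine(*params["input"], v)]        # gate 912
    f = [sigmoid(z) for z in affine(*params["forget"], v)]       # gate 914
    o = [sigmoid(z) for z in affine(*params["output"], v)]       # gate 916
    g = [math.tanh(z) for z in affine(*params["candidate"], v)]  # tanh 918
    # Operator 920 (i * g), operator 922 (f * c_prev), operator 924 (sum).
    c = [fi * cp + ii * gi for fi, cp, ii, gi in zip(f, c_prev, i, g)]
    # Tanh 926 followed by operator 928 yields the hidden state 930.
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


if __name__ == "__main__":
    zero = ([[0.0, 0.0]], [0.0])  # one hidden unit, one input feature
    params = {name: zero for name in ("input", "forget", "output", "candidate")}
    print(lstm_step([1.0], [0.0], [2.0], params))
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate value is zero, so the cell state is simply halved at each step. This makes the role of the forget gate easy to check by hand.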

FIGS. 10-12 illustrate a sequence of the use of the network 900 to generate predicted text. The lower row of letters 1000 represents input received from hard presses on keys of the virtual keyboard 216 and/or from selection of previously predicted letters and/or words. These are input to the trained network 900. Using probabilities correlated with letters from the heatmap as illustrated at 1002, a next predicted letter 1004 is generated and fed back to the model. The sequence shown in FIGS. 10-12 generates predicted letters for an initial input of “play” that results in the word “PlayStation”.
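The feedback of each predicted letter into the model may be sketched as the loop below. The predictor here is a hypothetical stand-in (a lookup against one known word) used only so that the loop can run; it is not the trained network 900, and the names generate and toy_predictor are assumptions. Capitalization of the result is ignored in this sketch.

```python
from typing import Callable, Optional


def generate(prefix: str,
             predict_next: Callable[[str], Optional[str]],
             max_len: int = 32) -> str:
    """Repeatedly append the predicted next letter and feed the result back."""
    text = prefix
    while len(text) < max_len:
        next_letter = predict_next(text)
        if next_letter is None:  # the model has nothing further to predict
            break
        text += next_letter      # the predicted letter becomes part of the input
    return text


def toy_predictor(text: str) -> Optional[str]:
    """Hypothetical stand-in for the trained network: completes one known word."""
    target = "playstation"
    lowered = text.lower()
    if target.startswith(lowered) and len(lowered) < len(target):
        return target[len(lowered)]
    return None


print(generate("play", toy_predictor))
```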

FIG. 13 is a flow chart of example logic consistent with present principles. The NN system(s) described herein are trained at block 1300. Moving to block 1302, a hard press is received on the touchpad and a letter is established based thereon at block 1304, using the heatmap if desired. The letter is input to the NN system at block 1306, which outputs a predicted letter or words or string of words at block 1308. The predicted letters/words are presented on screen at block 1310.

If a user does not accept the predictions at state 1312, they may be removed from presentation at state 1314. Otherwise, accepted predictions are confirmed at block 1316 and presented in sequence after the letters established by the hard press.

Present principles may be used in all possible deep learning-based methods for image, video and audio data processing, among others.

Note that a user can indicate acceptance at state 1312 by speaking into the microphone 218 illustrated in FIG. 2. For example, the user may speak “OK” to accept the predicted word or “not right” or equivalent to reject it. This input is provided to the neural networks described herein as ground truth data for training the neural networks.

FIG. 14 illustrates using the components discussed above in relation to FIG. 2. A predicted word 1400 has been presented on the display 212 according to the description above. A user 1402 may speak into the microphone 218 as indicated at 1404 whether to accept or reject the predicted word 1400 contemporaneous with operation. In the example shown, the predicted word is “pony”, and the user 1402 has rejected it by speaking “no” into the microphone 218, which is digitized and provided as training data to the neural networks. The user 1402 has further input a correct word, in this case, “cow”, which also is provided as ground truth for training the neural networks.
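By way of a hedged sketch, spoken feedback may be converted into a labeled training example as shown below. The sketch assumes the utterance has already been transcribed to text by a speech recognizer that is not shown; the phrase sets, the TrainingExample record, and feedback_to_example are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed phrase lists; "or equivalent" phrases could be added.
ACCEPT_PHRASES = {"ok"}
REJECT_PHRASES = {"no", "not right"}


@dataclass
class TrainingExample:
    predicted: str          # word the networks presented
    accepted: bool          # ground truth label from the microphone
    target: Optional[str]   # word the user actually intended, if known


def feedback_to_example(predicted: str, utterance: str,
                        correction: Optional[str] = None
                        ) -> Optional[TrainingExample]:
    """Map transcribed voice feedback to a training example, or None."""
    spoken = utterance.strip().lower()
    if spoken in ACCEPT_PHRASES:
        return TrainingExample(predicted, True, predicted)
    if spoken in REJECT_PHRASES:
        # The corrected word, when supplied, is additional ground truth.
        return TrainingExample(predicted, False, correction)
    return None  # utterance is not recognized as feedback


example = feedback_to_example("pony", "no", correction="cow")
print(example)
```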

FIG. 15 illustrates a second use case for the microphone 218. In FIG. 15, assume that the user has typed in or the neural networks have predicted one or more Asian language characters such as Chinese Pinyin characters 1500 for presentation on the display 212. For illustration, the characters are simply numbered 1-4 in FIG. 15, and correspond to spoken words that may be rendered in English using identical Arabic letters (in the example shown, the letters “ma”) and thus may be indistinguishable from each other when so rendered in English. Below each character 1500, the word 1502 it corresponds to in English is shown and, below that, a respective symbol 1504 for the corresponding tone, it being understood that in implementation neither the word 1502 nor the symbol 1504 may be presented. Note that while Chinese pinyin is used as an example of Asian language characters, present principles apply to other Asian languages such as Japanese in which the same Arabic letter string may be translated into two or more Asian language words differentiated from each other by tonal differences and not consonant or vowel differences.

A user 1506 may speak a word including an intended Asian language tone 1508 into the microphone 218, which is input to the processor(s) herein as the correct or ground truth tone. In the example shown, the user 1506 has spoken the word using the tone corresponding to the third character 1500, giving the character the meaning “horse”. In this way, the user may have input, via the controller 200, the Arabic letters “ma” as indicated at 1510. The Arabic letters 1510 may be correlated to plural candidate Chinese words, which may be presented as respective Asian language characters if desired on the display 212. Tonal input from the microphone 218 is used to confirm and/or select which of the candidate characters/words the user intended by typing in “ma”, which may then be presented on the display in lieu of the other candidate words/symbols.
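A minimal sketch of the tone-based selection follows. It assumes that the tone has already been classified from the microphone signal into a number 1-4 by a component not shown, and that the four numbered characters correspond to the four standard Mandarin tones of “ma”; the dictionary contents and function names are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical dictionary: typed letters -> tone number -> (character, English word).
PINYIN_DICTIONARY: Dict[str, Dict[int, Tuple[str, str]]] = {
    "ma": {
        1: ("妈", "mother"),
        2: ("麻", "hemp"),
        3: ("马", "horse"),
        4: ("骂", "scold"),
    },
}


def candidates(letters: str) -> List[Tuple[str, str]]:
    """All candidate characters/words for the typed letters."""
    return list(PINYIN_DICTIONARY.get(letters.lower(), {}).values())


def select_by_tone(letters: str, tone: int) -> Optional[Tuple[str, str]]:
    """Pick the single candidate matching the detected tone, if any."""
    return PINYIN_DICTIONARY.get(letters.lower(), {}).get(tone)


print(candidates("ma"))         # the plural candidates before tonal input
print(select_by_tone("ma", 3))  # the candidate kept after tonal input
```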

FIG. 16 illustrates that during a touchpad swipe as described above, input from the microphone 218 may be used to input special characters/punctuations/graphics such as “smileys”. Assume that the user has input, or the neural networks have predicted, a word 1600, in the example shown, the word “yes”, that appears on the display 212. A user 1602 contemporaneously may speak the word 1600 with a tone that is detected by the neural networks to correspond to excitement as indicated at 1604, to cause an exclamation point 1606 to appear after the word 1600.

Or, the user may utter, as indicated at 1608, the name of the desired symbol or punctuation to cause the uttered symbol to be presented on the display 212. Yet again, the user may utter, as indicated at 1610, the name of a desired graphic symbol such as “smiley”, to cause the uttered graphic symbol to be presented at 1612 on the display 212.
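The voice-driven insertion of FIG. 16 may be sketched as follows, assuming the utterance has already been transcribed and that a separate detector, not shown, supplies an excited flag. The symbol table, the text rendering of the smiley, and the function name decorate are hypothetical.

```python
# Assumed mapping from spoken names to symbols; a graphic could replace ":)".
SPOKEN_SYMBOLS = {
    "exclamation point": "!",
    "question mark": "?",
    "comma": ",",
    "smiley": " :)",
}


def decorate(word: str, utterance: str, excited: bool = False) -> str:
    """Return the displayed text after applying the spoken input."""
    spoken = utterance.strip().lower()
    if spoken in SPOKEN_SYMBOLS:
        # The user uttered the name of a symbol, punctuation, or graphic.
        return word + SPOKEN_SYMBOLS[spoken]
    if spoken == word.lower() and excited:
        # The user spoke the word itself with a tone detected as excitement.
        return word + "!"
    return word


print(decorate("yes", "yes", excited=True))
print(decorate("yes", "smiley"))
```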

FIG. 17 illustrates example logic consistent with FIG. 15. Commencing at block 1700, input may be received, e.g., from the touch pad of the controller or from other manipulable device, indicating Arabic letters and/or Chinese character(s). Block 1702 indicates that such input may be ambiguous in that it may correlate to more than one candidate Asian language word with corresponding symbol, in which case the logic flows to block 1704.

For instance, and using Chinese as an example, when input is received of Arabic letters, such as the letters “ma” discussed previously, more than one pinyin symbol (corresponding to more than one Chinese word) may be a candidate for the user's intent. Or, when touch input is received on the touch pad attempting to render a Chinese character, owing to imprecision in the touch tracing more than one Chinese character might be implicated.

At block 1704, using a dictionary (for example, correlating Arabic “ma” to the four pinyin characters described in reference to FIG. 15) the candidate pinyin symbols/corresponding Chinese words may be identified. In addition, or alternatively, handwriting recognition may be employed in the case of, e.g., an attempted trace of a Chinese character using the touch pad, to identify candidate Chinese characters that may fulfill the user's intent.

Proceeding to block 1706, a user may be prompted to speak to resolve the ambiguity and essentially select the user-preferred candidate symbol/word from block 1704. FIG. 15 illustrates a non-limiting example screen shot showing such a prompt.

It will be appreciated that whilst present principles have been described with reference to some example embodiments, these are not intended to be limiting, and that various alternative arrangements may be used to implement the subject matter claimed herein.

1. An apparatus, comprising: at least one processor configured with instructions to: receive a touch signal from a touch surface of a computer simulation controller to identify at least two Arabic letters; input the Arabic letters to at least a first neural network (NN); receive, from at least one microphone, input; and identify, using the Arabic letters, at least first and second candidate Chinese words; and responsive to identifying a tone in the input from the microphone, select the first Chinese word but not the second Chinese word.
2. The apparatus of claim 1, wherein the processor and microphone are embodied in the computer simulation controller.
3. The apparatus of claim 1, wherein the processor is embodied in a computer simulation console configured for communicating with the computer simulation controller.
4. The apparatus of claim 1, wherein the instructions are executable to: identify at least one punctuation symbol using the input from the microphone; and responsive to identifying the punctuation symbol, present the punctuation symbol on the display.
5. The apparatus of claim 1, wherein the instructions are executable to: identify at least one tone using the input from the microphone; and responsive to identifying the tone, identify for presentation on the display at least one Chinese Pinyin character.
6. (canceled)
7. The apparatus of claim 1, wherein the first NN comprises plural long short-term memory (LSTM) networks.
8. An apparatus, comprising: at least one processor; and at least one computer storage that is not a transitory signal and that comprises instructions executable by the at least one processor to: identify at least one tone using input from a microphone; and responsive to identifying the tone, identify for presentation on a display at least one Asian language character.
9. The apparatus of claim 8, wherein the instructions are executable to: receive from a touch surface indication of at least two Arabic letters; identify, using the Arabic letters, at least first and second candidate Asian language words; and responsive to identifying the tone, select the first Asian language word but not the second Asian language word.
10. The apparatus of claim 8, wherein the instructions are executable to: receive a touch signal from the touch surface, the touch surface being part of a computer simulation controller, to identify a first alpha-numeric character; input the first alpha-numeric character to at least a first neural network (NN); responsive to the input of the first alpha-numeric character, receive from the first NN a predicted sequence of alpha-numeric characters comprising at least a first predicted alpha-numeric character for presentation on at least one display; receive, from the microphone, input indicating acceptance or rejection of at least the first predicted alpha-numeric character; and provide the input from the microphone to the first NN to train the first NN.
11. The apparatus of claim 8, wherein the processor and microphone are embodied in a computer simulation controller.
12. The apparatus of claim 8, wherein the processor is embodied in a computer simulation console configured for communicating with a computer simulation controller.
13. The apparatus of claim 8, wherein the instructions are executable to: identify at least one punctuation symbol using the input from the microphone; and responsive to identifying the punctuation symbol, present the punctuation symbol on the display.
14. The apparatus of claim 10, wherein the first NN comprises plural long short-term memory (LSTM) networks.
15. An apparatus, comprising: at least one processor configured with code to: receive from a touch surface indication of at least two Arabic letters; identify, using the Arabic letters, at least first and second Chinese words; and responsive to identifying a tone received in at least one signal from a microphone, select the first Chinese word but not the second Chinese word.
16. The apparatus of claim 15, wherein the instructions are executable to: identify at least one tone using input from the microphone; and responsive to identifying the tone, identify for presentation on the display at least one Chinese Pinyin character.
17. (canceled)
18. The apparatus of claim 15, wherein the instructions are executable to: receive a touch signal from a touch surface, the touch surface being part of a computer simulation controller, to identify a first alpha-numeric character; input the first alpha-numeric character to at least a first neural network (NN); responsive to the input of the first alpha-numeric character, receive from the first NN a predicted sequence of alpha-numeric characters comprising at least a first predicted alpha-numeric character for presentation on at least one display; receive, from the microphone, input indicating acceptance or rejection of at least the first predicted alpha-numeric character; and provide the input from the microphone to the first NN to train the first NN.
19. The apparatus of claim 15, wherein the processor and microphone are embodied in a computer simulation controller.
20. The apparatus of claim 18, wherein the first NN comprises plural long short-term memory (LSTM) networks.